(54) Title: IDENTIFYING THE ITEMS MOST RELEVANT TO A CURRENT QUERY BASED ON ITEMS SELECTED IN
CONNECTION WITH SIMILAR QUERIES



(57) Abstract 

The present invention provides a software facility for identifying the items most
relevant to a current query based on items selected in connection with similar queries.
In preferred embodiments of the invention, the facility receives a query specifying one or
more query terms. In response, the facility generates a query result identifying a plurality of
items that satisfy the query. The facility then produces a ranking value for at least a portion
of the items identified in the query result by combining the relative frequencies with which
users selected that item from the query results generated from queries specifying each of
the terms specified by the query. The facility identifies as most relevant those items having
the highest ranking values.
IDENTIFYING THE ITEMS MOST RELEVANT TO A CURRENT QUERY BASED
ON ITEMS SELECTED IN CONNECTION WITH SIMILAR QUERIES 

TECHNICAL FIELD 
The present invention is directed to the field of query processing.

BACKGROUND OF THE INVENTION 

Many World Wide Web sites permit users to perform searches to
identify a small number of interesting items among a much larger domain of items. As
an example, several web index sites permit users to search for particular web sites
among most of the known web sites. Similarly, many online merchants, such as
booksellers, permit users to search for particular products among all of the products that
can be purchased from a merchant. In many cases, users perform searches in order to
ultimately find a single item within an entire domain of items.

In order to perform a search, a user submits a query containing one or
more query terms. The query also explicitly or implicitly identifies a domain of items
to search. For example, a user may submit a query to an online bookseller containing
terms that the user believes are words in the title of a book. A query server program
processes the query to identify within the domain items matching the terms of the
query. The items identified by the query server program are collectively known as a
query result. In the example, the query result is a list of books whose titles contain
some or all of the query terms. The query result is typically displayed to the user as a
list of items. This list may be ordered in various ways. For example, the list may be
ordered alphabetically or numerically based on a property of each item, such as the title,
author, or release date of each book. As another example, the list may be ordered based
on the extent to which each identified item matches the terms of the query.

When the domain for a query contains a large number of items, it is
common for query results to contain tens or hundreds of items. Where the user is
performing the search in order to find a single item, application of conventional
approaches to ordering the query result often fails to place the sought item or items near
the top of the query result, so that the user must read through many other items in the
query result before reaching the sought item. In view of this disadvantage of
conventional approaches to ordering query results, a new, more effective technique for
automatically ordering query results in accordance with collective and individual user
behavior would have significant utility.

Further, it is fairly common for users to specify queries that are not
satisfied by any items. This may happen, for example, where a user submits a detailed
query that is very narrow, or where a user mistypes or misremembers a term in the
query. In such cases, conventional techniques, which present only items that satisfy the
query, present no items to the user. When no items are presented to a user in response
to issuing a query, the user can become frustrated with the search engine, and may even
discontinue its use. Accordingly, a technique for displaying items relating to at least
some of the terms in a query even when no items completely match the query would
have significant utility.

In order to satisfy this need, some search engines adopt a strategy of
effectively automatically revising the query until a non-empty result set is produced.
For example, a search engine may progressively delete conjunctive, i.e., ANDed, terms
from a multiple term query until the result set produced for that query contains items.
This strategy has the disadvantage that important information for choosing the correct
items can be lost when query terms are arbitrarily deleted. As a result, the first
non-empty result set can be quite large, and may contain a large percentage of items that are
irrelevant to the original query as a whole. For this reason, a more effective technique
for displaying items relating to at least some of the terms in a query even when no items
completely match the query would have significant utility.

SUMMARY OF THE INVENTION 

The present invention provides a software facility ("the facility") for
identifying the items most relevant to a current query based on items selected in
connection with similar queries. The facility preferably generates ranking values for
items indicating their level of relevance to the current query, which specifies one or
more query terms. The facility generates a ranking value for an item by combining
rating scores, produced by a rating function, that each correspond to the level of
relevance of the item to queries containing one of the query terms. The rating
function preferably retrieves a rating score for the combination of an item and a term
from a rating table generated by the facility. The scores in the rating table preferably
reflect, for a particular item and term, how often users have selected the item when the
item has been identified in query results produced for queries containing the
term.

In different embodiments, the facility uses the rating scores to either
generate a ranking value for each item in a query result, or generate ranking values for a
smaller number of items in order to select a few items having the top ranking values.
To generate a ranking value for a particular item in a query result, the facility combines
the rating scores corresponding to that item and the terms of the query. In embodiments
in which the goal is to generate ranking values for each item in the query result, the
facility preferably loops through the items in the query results and, for each item,
combines all of the rating scores corresponding to that item and any of the terms in the
query. On the other hand, in embodiments in which the goal is to select a few items in
the query result having the largest ranking values, the facility preferably loops through
the terms in the query, and, for each term, identifies the top few rating scores for that
term and any item. The facility then combines the scores identified for each item to
generate ranking values for a relatively small number of items, which may include items
not identified in the query result. Indeed, these embodiments of the invention are able
to generate ranking values for and display items even in cases in which the query result
is empty, i.e., when no items completely satisfy the query.

Once the facility has generated ranking values for at least some items,
the facility preferably orders the items of the query result in decreasing order of ranking
value. The facility may also use the ranking values to subset the items in the query
result to a smaller number of items. By ordering and/or subsetting the items in the
query result in this way in accordance with collective and individual user behavior
rather than in accordance with attributes of the items, the facility substantially increases
the likelihood that the user will quickly find within the query result the particular item
or items that he or she seeks. For example, while a query result for a query containing
the query terms "human" and "dynamic" may contain a book about human dynamics
and a book about the effects on human beings of particle dynamics, selections by users
from early query results produced for queries containing the term "human" show that
these users select the human dynamics book much more frequently than they select the
particle dynamics book. The facility therefore ranks the human dynamics book higher
than the particle dynamics book, allowing users that are more interested in the human
dynamics book to select it more easily. This benefit of the facility is especially useful
in conjunction with the large, heterogeneous query results that are typically generated
for single-term queries, which are commonly submitted by users.

Various embodiments of the invention base rating scores on different
kinds of selection actions performed by the users on items identified in query results.
These include whether the user displayed additional information about an item, how
much time the user spent viewing the additional information about the item, how many
hyperlinks the user followed within the additional information about the item, whether
the user added the item to his or her shopping basket, and whether the user ultimately
purchased the item. Embodiments of the invention also consider selection actions not
relating to query results, such as typing an item's item identifier rather than choosing
the item from a query result. Additional embodiments of the invention incorporate into
the ranking process information about the user submitting the query by maintaining and
applying separate rating scores for users in different demographic groups, such as those
of the same sex, age, income, or geographic category. Certain embodiments also
incorporate behavioral information about specific users. Further, rating scores may be
produced by a rating function that combines different types of information reflecting
collective and individual user preferences. Some embodiments of the invention utilize
specialized strategies for incorporating into the rating scores information about queries
submitted in different time frames.
BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a high-level block diagram showing the computer system
upon which the facility preferably executes.

Figure 2 is a flow diagram showing the steps preferably performed by
the facility in order to generate a new rating table.

Figures 3 and 4 are table diagrams showing augmentation of an item
rating table in accordance with step 206 (Figure 2).

Figure 5 is a table diagram showing the generation of rating tables for
composite periods of time from rating tables for constituent periods of time.

Figure 6 is a table diagram showing a rating table for a composite period.

Figure 7 is a flow diagram showing the steps preferably performed by
the facility in order to identify user selections within a web server log.

Figure 8 is a flow diagram showing the steps preferably performed by
the facility to order a query result using a rating table by generating a ranking value for
each item in the query result.

Figure 9 is a flow diagram showing the steps preferably performed by
the facility to select a few items in a query result having the highest ranking values
using a rating table.

DETAILED DESCRIPTION OF THE INVENTION

The present invention provides a software facility ("the facility") for
identifying the items most relevant to a current query based on items selected in
connection with similar queries. The facility preferably generates ranking values for
items indicating their level of relevance to the current query, which specifies one or
more query terms. The facility generates a ranking value for an item by combining
rating scores, produced by a rating function, that each correspond to the level of
relevance of the item to queries containing one of the query terms. The rating
function preferably retrieves a rating score for the combination of an item and a term
from a rating table generated by the facility. The scores in the rating table preferably
reflect, for a particular item and term, how often users have selected the item when the
item has been identified in query results produced for queries containing the term.

In different embodiments, the facility uses the rating scores to either
generate a ranking value for each item in a query result, or generate ranking values for a
smaller number of items in order to select a few items having the top ranking values.
To generate a ranking value for a particular item in a query result, the facility combines
the rating scores corresponding to that item and the terms of the query. In embodiments
in which the goal is to generate ranking values for each item in the query result, the
facility preferably loops through the items in the query results and, for each item,
combines all of the rating scores corresponding to that item and any of the terms in the
query. On the other hand, in embodiments in which the goal is to select a few items in
the query result having the largest ranking values, the facility preferably loops through
the terms in the query, and, for each term, identifies the top few rating scores for that
term and any item. The facility then combines the scores identified for each item to
generate ranking values for a relatively small number of items, which may include items
not identified in the query result. Indeed, these embodiments of the invention are able
to generate ranking values for and display items even in cases in which the query result
is empty, i.e., when no items completely satisfy the query.
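The two ranking strategies described above can be sketched as follows. This is a minimal illustration, not the facility's actual implementation: the text says only that rating scores are "combined", so summation is an assumption here, as are the function names, the `per_term` and `limit` parameters, and the representation of the rating table as a dictionary keyed by (term, item identifier).

```python
import heapq
from collections import defaultdict


def rank_query_result(rating_table, query_terms, result_items):
    """Rank every item in a query result (item-driven loop).

    Assumption: scores are combined by summation; a missing entry counts as 0.
    """
    ranking = {}
    for item in result_items:
        ranking[item] = sum(rating_table.get((term, item), 0) for term in query_terms)
    # Decreasing order of ranking value; ties keep their original order.
    return sorted(result_items, key=lambda item: ranking[item], reverse=True)


def top_items(rating_table, query_terms, per_term=3, limit=3):
    """Select a few top items (term-driven loop); works even for an empty result."""
    by_term = defaultdict(list)
    for (term, item), score in rating_table.items():
        by_term[term].append((score, item))
    ranking = defaultdict(int)
    for term in query_terms:
        # Keep only the top few rating scores for this term and any item.
        for score, item in heapq.nlargest(per_term, by_term.get(term, [])):
            ranking[item] += score
    return sorted(ranking, key=lambda item: (-ranking[item], item))[:limit]


# Hypothetical rating table with made-up item identifiers "A", "B", "C".
table = {("human", "A"): 45, ("dynamics", "A"): 22,
         ("dynamics", "B"): 10, ("human", "C"): 3}
ordered = rank_query_result(table, ["human", "dynamics"], ["B", "A"])
best = top_items(table, ["human", "dynamics"], per_term=2, limit=2)
```

Because `top_items` never consults the query result, it can still propose items when no item satisfies every term of the query, which is the case the paragraph above singles out.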

WO 99/45487 PCT/US98/26985

Once the facility has generated ranking values for at least some items,
the facility preferably orders the items of the query result in decreasing order of ranking
value. The facility may also use the ranking values to subset the items in the query
result to a smaller number of items. By ordering and/or subsetting the items in the
query result in this way in accordance with collective and individual user behavior
rather than in accordance with attributes of the items, the facility substantially increases
the likelihood that the user will quickly find within the query result the particular item
or items that he or she seeks. For example, while a query result for a query containing
the query terms "human" and "dynamic" may contain a book about human dynamics
and a book about the effects on human beings of particle dynamics, selections by users
from early query results produced for queries containing the term "human" show that
these users select the human dynamics book much more frequently than they select the
particle dynamics book. The facility therefore ranks the human dynamics book higher
than the particle dynamics book, allowing users, most of whom are more interested in
the human dynamics book, to select it more easily. This benefit of the facility is
especially useful in conjunction with the large, heterogeneous query results that are
typically generated for single-term queries, which are commonly submitted by users.

Various embodiments of the invention base rating scores on different
kinds of selection actions performed by the users on items identified in query results.
These include whether the user displayed additional information about an item, how
much time the user spent viewing the additional information about the item, how many
hyperlinks the user followed within the additional information about the item, whether
the user added the item to his or her shopping basket, and whether the user ultimately
purchased the item. Embodiments of the invention also consider selection actions not
relating to query results, such as typing an item's item identifier rather than choosing
the item from a query result. Additional embodiments of the invention incorporate into
the ranking process information about the user submitting the query by maintaining and
applying separate rating scores for users in different demographic groups, such as those
of the same sex, age, income, or geographic category. Certain embodiments also
incorporate behavioral information about specific users. Further, rating scores may be
produced by a rating function that combines different types of information reflecting
collective and individual user preferences. Some embodiments of the invention utilize
specialized strategies for incorporating into the rating scores information about queries
submitted in different time frames.

Figure 1 is a high-level block diagram showing the computer system
upon which the facility preferably executes. As shown in Figure 1, the computer
system 100 comprises a central processing unit (CPU) 110, input/output devices 120,
and a computer memory (memory) 130. Among the input/output devices are a storage
device 121, such as a hard disk drive; a computer-readable media drive 122, which can
be used to install software products, including the facility, which are provided on a
computer-readable medium, such as a CD-ROM; and a network connection 123 for
connecting the computer system 100 to other computer systems (not shown). The
memory 130 preferably contains a query server 131 for generating query results from
queries, a query result ranking facility 132 for automatically ranking the items in a
query result in accordance with collective user preferences, and item rating tables 133
used by the facility. While the facility is preferably implemented on a computer system
configured as described above, those skilled in the art will recognize that it may also be
implemented on computer systems having different configurations.

The facility preferably generates a new rating table periodically, and,
when a query result is received, uses the last-generated rating table to rank the items in
the query result. Figure 2 is a flow diagram showing the steps preferably performed by
the facility in order to generate a new rating table. In step 201, the facility initializes a
rating table for holding entries each indicating the rating score for a particular
combination of a query term and an item identifier. The rating table preferably has no
entries when it is initialized. In step 202, the facility identifies all of the query result
item selections made by users during the period of time for which the rating table is
being generated. The rating table may be generated for the queries occurring during a
period of time such as a day, a week, or month. This group of queries is termed a
"rating set" of queries. The facility also identifies the terms of the queries that produced
these query results in step 202. Performance of step 202 is discussed in greater detail
below in conjunction with Figure 7. In steps 203-208, the facility loops through each
item selection from a query result that was made by a user during the time period. In
step 204, the facility identifies the terms used in the query that produced the query result
in which the item selection took place. In steps 205-207, the facility loops through each
term in the query. In step 206, the facility increases the rating score in the rating table
corresponding to the current term and item. Where an entry does not yet exist in the
rating table for the term and item, the facility adds a new entry to the rating table for the
term and item. Increasing the rating score preferably involves adding an increment
value, such as 1, to the existing rating score for the term and item. In step 207, if
additional terms remain to be processed, the facility loops back to step 205 to process
the next term in the query, else the facility continues in step 208. In step 208, if
additional item selections remain to be processed, then the facility loops back to step
203 to process the next item selection, else these steps conclude.
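The loop structure of Figure 2 can be illustrated with a short sketch. It assumes, for illustration only, that step 202 has already produced a list of (query terms, selected item identifier) pairs; the function name `build_rating_table` and the dictionary representation of the rating table are likewise hypothetical.

```python
from collections import defaultdict


def build_rating_table(selections, increment=1):
    """Build a rating table from (query_terms, item_id) selection pairs.

    The table maps (term, item_id) to a rating score. It starts empty
    (step 201) and an entry is created the first time it is incremented.
    """
    table = defaultdict(int)
    for query_terms, item_id in selections:       # outer loop over item selections
        for term in query_terms:                  # inner loop over query terms
            table[(term, item_id)] += increment   # step 206: increase the score
    return dict(table)


# Made-up rating set: three selections during the period.
selections = [
    (["human", "dynamics"], "1883823064"),
    (["dynamics"], "1883823064"),
    (["dynamics"], "9676530409"),
]
table = build_rating_table(selections)
```

In this sketch a selection from a two-term query increments two entries, one per term, which is why the same item can accumulate different scores under different terms.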

Figures 3 and 4 are table diagrams showing augmentation of an item
rating table in accordance with step 206 (Figure 2). Figure 3 shows the state of the item
rating table before its augmentation. It can be seen that the table 300 contains a number
of entries, including entries 301-306. Each entry contains the rating score for a
particular combination of a query term and an item identifier. For example, entry 302
identifies the score "22" for the term "dynamics" and the item identifier "1883823064". It
can be seen by examining entries 301-303 that, in query results produced from queries
including the term "dynamics", the item having item identifier "1883823064" has been
selected by users more frequently than the item having item identifier "9676530409",
and much more frequently than the item having item identifier "0801062272". In
additional embodiments, the facility uses various other data structures to store the rating
scores, such as sparse arrays.

In augmenting the item rating table 300, the facility identifies the
selection of the item having item identifier "1883823064" from a query result produced
by a query specifying the query terms "human" and "dynamics". Figure 4 shows the
state of the item rating table after the item rating table is augmented by the facility to
reflect this selection. It can be seen by comparing entry 405 in item rating table 400 to
entry 305 in item rating table 300 that the facility has incremented the score for this
entry from "45" to "46". Similarly, the facility has incremented the rating score for this
item identifier and the term "dynamics" from "22" to "23". The facility augments the rating
table in a similar manner for the other selections from query results that it identifies
during the time period.

Rather than generating a new rating table from scratch using the steps
shown in Figure 2 each time new selection information becomes available, the facility
preferably generates and maintains separate rating tables for different constituent time
periods of a relatively short length, such as one day. Each time a rating table is
generated for a new constituent time period, the facility preferably combines this new
rating table with existing rating tables for earlier constituent time periods to form a
rating table for a longer composite period of time. Figure 5 is a table diagram showing
the generation of rating tables for composite periods of time from rating tables for
constituent periods of time. It can be seen in Figure 5 that rating tables 501-506 each
correspond to a single day between 8-Feb-98 and 13-Feb-98. Each time a new
constituent period is completed, the facility generates a new rating table reflecting the
user selections made during that constituent period. For example, at the end of
12-Feb-98, the facility generates rating table 505, which reflects all of the user
selections occurring during 12-Feb-98. After the facility generates a new rating table
for a completed constituent period, the facility also generates a new rating table for a
composite period ending with that constituent period. For example, after generating the
rating table 505 for the constituent period 12-Feb-98, the facility generates rating table
515 for the composite period 8-Feb-98 to 12-Feb-98. The facility preferably generates
such a rating table for a composite period by combining the entries of the rating tables
for the constituent periods making up the composite period, and combining the scores
of corresponding entries, for example, by summing them. In one preferred
embodiment, the scores in rating tables for more recent constituent periods are
weighted more heavily than those in rating tables for less recent constituent periods.
When ranking query results, the rating table for the most recent composite period is
preferably used. That is, until rating table 516 can be generated, the facility preferably
uses rating table 515 to rank query results. After rating table 516 is generated, the
facility preferably uses rating table 516 to rank query results. The lengths of both
constituent periods and composite periods are preferably configurable.
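A composite rating table of this kind might be computed as in the sketch below. Summation of corresponding entries follows the example given in the text; the optional per-period weights are one possible way to weight recent periods more heavily and are an assumption, as are the function name and the sample scores.

```python
def combine_rating_tables(constituent_tables, weights=None):
    """Combine constituent-period rating tables into one composite table.

    constituent_tables: list of dicts mapping (term, item_id) to a score,
    ordered from least recent to most recent.
    weights: optional list of multipliers, one per table (hypothetical
    scheme); defaults to 1.0 for every table, i.e., plain summation.
    """
    if weights is None:
        weights = [1.0] * len(constituent_tables)
    composite = {}
    for table, weight in zip(constituent_tables, weights):
        for key, score in table.items():
            # Entries present in only some periods are still carried over.
            composite[key] = composite.get(key, 0) + weight * score
    return composite


# Made-up daily tables for two constituent periods.
day_1 = {("dynamics", "1883823064"): 20}
day_2 = {("dynamics", "1883823064"): 23, ("dynamics", "1887650024"): 4}

summed = combine_rating_tables([day_1, day_2])
recency_weighted = combine_rating_tables([day_1, day_2], weights=[0.5, 1.0])
```

Keeping the constituent tables separate means a new composite table can be formed each day by adding the newest table and dropping or down-weighting the oldest, without rescanning older selection data.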

Figure 6 is a table diagram showing a rating table for a composite period.
By comparing the item rating table 600 shown in Figure 6 to item rating table 400
shown in Figure 4, it can be seen that the contents of rating table 600 constitute the
combination of the contents of rating table 400 with several other rating tables for
constituent periods. For example, the score for entry 602 is "116", or about five times
the score for corresponding entry 402. Further, although rating table 400 does not
contain an entry for the term "dynamics" and the item identifier "1887650024", entry
607 has been added to table 600 for this combination of term and item identifier, as a
corresponding entry occurs in a rating table for one of the other constituent periods
within the composite period.

The process used by the facility to identify user selections is dependent
upon both the kind of selection action used by the facility and the manner in which the
data relating to such selection actions is stored. One preferred embodiment uses as its
selection action requests to display more information about items identified in query
results. In this embodiment, the facility extracts this information from logs generated
by a web server that generates query results for a user using a web client, and allows the
user to select an item with the web client in order to display additional information about
it. A web server generally maintains a log detailing all of the HTTP requests that it has
received from web clients and responded to. Such a log is generally made up of entries,
each containing information about a different HTTP request. Such logs are generally
organized chronologically. Log Entry 1 below is a sample log entry showing an HTTP
request submitted by a web client on behalf of the user that submits a query.

1. Friday, 13-Feb-98 16:59:27
2. User Identifier=82707238671
3. HTTP_REFERER=http://www.amazon.com/book_query_page
4. PATH_INFO=/book_query
5. author="Seagal"
6. title="Human Dynamics"

Log Entry 1

It can be seen by the occurrence of the keyword "book_query" in the "PATH_INFO"
line 4 of Log Entry 1 that this log entry corresponds to a user's submission of a query.
It further can be seen in term lines 5 and 6 that the query includes the terms "Seagal",
"Human", and "Dynamics". In line 2, the entry further contains a user identifier
corresponding to the identity of the user and, in some embodiments, also to this
particular interaction with the web server.

In response to receiving the HTTP request documented in Log Entry 1,
the query server generates a query result for the query and returns it to the web client
submitting the query. Later the user selects an item identified in the query result, and
the web client submits another HTTP request to display detailed information about the
selected item. Log Entry 2, which occurs at a point after Log Entry 1 in the log,
describes this second HTTP request.

1. Friday, 13-Feb-98 17:02:39
2. User Identifier=82707238671
3. HTTP_REFERER=http://www.amazon.com/book_query
4. PATH_INFO=/ISBN=1883823064

Log Entry 2

By comparing the user identifier in line 2 of Log Entry 2 to the user identifier in line 2
of Log Entry 1, it can be seen that these log entries correspond to the same user and
time frame. In the "PATH_INFO" line 4 of Log Entry 2, it can be seen that the user has
selected an item having item identifier ("ISBN") "1883823064". It can further be seen
from the occurrence of the keyword "book_query" on the "HTTP_REFERER" line 3
that the selection of this item was from a query result.

Where information about user selections is stored in web server logs
such as those discussed above, the facility preferably identifies user selections by
traversing these logs. Such traversal can occur either in a batch processing mode after a
log for a specific period of time has been completely generated, or in a real-time
processing mode so that log entries are processed as soon as they are generated.

Figure 7 is a flow diagram showing the steps preferably performed by
the facility in order to identify user selections within a web server log. In step 701, the
facility positions a first pointer at the top, or beginning, of the log. The facility then
repeats steps 702-708 until the first pointer reaches the end of the log. In step 703, the
facility traverses forward with the first pointer to the next item selection event. In terms
of the log entries shown above, step 703 involves traversing forward through log entries
until one is found that contains in its "HTTP_REFERER" line a keyword denoting a
search entry, such as "book_query". In step 704, the facility extracts from this item
selection event the identity of the item that was selected and the session identifier that
identifies the user that selected the item. In terms of the log entries above, this involves
reading the ten-digit number following the string "ISBN=" in the "PATH_INFO" line
of the log entry, and reading the user identifier from the "User Identifier" line of the log
entry. Thus, in Log Entry 2, the facility extracts item identifier "1883823064" and
session identifier "82707238671". In step 705, the facility synchronizes the position of
a second pointer with the position of the first pointer. That is, the facility makes the
second pointer point to the same log entry as the first pointer. In step 706, the facility
traverses backwards with the second pointer to a query event having a matching user
identifier. In terms of the log entries above, the facility traverses backward to the log
entry having the keyword "book_query" in its "PATH_INFO" line, and having a
matching user identifier on its "User Identifier" line. In step 707, the facility extracts
from the query event to which the second pointer points the terms of the query. In
terms of the query log entries above, the facility extracts the quoted words from the
query log entry to which the second pointer points, in the lines after the "PATH_INFO"
line. Thus, in Log Entry 1, the facility extracts the terms "Seagal", "Human", and
"Dynamics". In step 708, if the first pointer has not yet reached the end of the log, then
the facility loops back to step 702 to continue processing the log, else these steps
conclude.
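The two-pointer traversal of Figure 7 can be sketched as follows, assuming the log has already been parsed into a chronological list of dictionaries. The field names `user`, `referer`, `path`, and `params` are hypothetical stand-ins for the "User Identifier", "HTTP_REFERER", "PATH_INFO", and term lines of the sample entries. The additional check that the path begins with "/ISBN=" is also an assumption, added because the referer of Log Entry 1 ("book_query_page") contains the keyword "book_query" as well.

```python
def find_selections(log):
    """Return (query_terms, item_id) pairs found in a chronological log."""
    selections = []
    for first, entry in enumerate(log):               # first pointer moves forward
        is_selection = ("book_query" in entry["referer"]
                        and entry["path"].startswith("/ISBN="))
        if not is_selection:
            continue
        item_id = entry["path"][len("/ISBN="):]       # identifier after "ISBN="
        user = entry["user"]
        for second in range(first - 1, -1, -1):       # second pointer moves backward
            earlier = log[second]
            if earlier["user"] == user and "book_query" in earlier["path"]:
                # Split each quoted field value into individual query terms.
                terms = [word
                         for value in earlier.get("params", {}).values()
                         for word in value.split()]
                selections.append((terms, item_id))
                break
    return selections


# The two sample log entries, transcribed into the assumed dictionary form.
log = [
    {"user": "82707238671",
     "referer": "http://www.amazon.com/book_query_page",
     "path": "/book_query",
     "params": {"author": "Seagal", "title": "Human Dynamics"}},
    {"user": "82707238671",
     "referer": "http://www.amazon.com/book_query",
     "path": "/ISBN=1883823064",
     "params": {}},
]
selections = find_selections(log)
```

The resulting pairs are in the shape that a rating table builder would consume, one pair per item selection.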

When other selection actions are used by the facility, extracting
information about the selection from the web server log can be somewhat more
involved. For example, where the facility uses purchase of the item as the selection
action, instead of identifying a log entry describing a request by the user for more
information about an item, like Log Entry 1, the facility instead identifies a log entry
describing a request to purchase items in a "shopping basket." The facility then
traverses backwards in the log, using the entries describing requests to add items to and
remove items from the shopping basket to determine which items were in the shopping
basket at the time of the request to purchase. The facility then continues traversing
backward in the log to identify the log entry describing the query, like Log Entry 2, and
to extract the search terms.
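
As a rough sketch of this backward traversal, the following Python reconstructs the
contents of the shopping basket at the time of a purchase request. The event tuples
("query", "add", "remove", "purchase"), the function name basket_and_terms, and the
simplifying assumption that every relevant add and remove occurs after the query are all
hypothetical; an actual log would first have to be parsed into this form.

```python
from collections import Counter


def basket_and_terms(events, purchase_index):
    """Walk backwards from the purchase request at events[purchase_index].

    Returns (items in the basket at purchase time, terms of the preceding query).
    """
    pending_removals = Counter()   # removals seen that have not yet cancelled an add
    basket = []
    for kind, payload in reversed(events[:purchase_index]):
        if kind == "remove":
            pending_removals[payload] += 1
        elif kind == "add":
            if pending_removals[payload] > 0:
                pending_removals[payload] -= 1   # this add was undone by a later remove
            else:
                basket.append(payload)           # still in the basket at purchase time
        elif kind == "query":
            return basket, payload               # payload holds the query terms
    return basket, []                            # no query found before the purchase


events = [
    ("query", ["human", "dynamics"]),
    ("add", "1883823064"),
    ("add", "0814403484"),
    ("remove", "0814403484"),
    ("purchase", None),
]
print(basket_and_terms(events, 4))
# (['1883823064'], ['human', 'dynamics'])
```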

Rather than relying solely on a web server log where item purchase is the
selection action that is used by the facility, the facility alternatively uses a database
separate from the web server log to determine which items are purchased in each
purchase transaction. This information from the database is then matched up with the
log entry containing the query terms for the query from which the item is selected for
purchase. This hybrid approach, using the web server logs and a separate database, may
be used for any of the different kinds of selection actions. Additionally, where a
database separate from the web server log contains all the information necessary to
augment the rating table, the facility may use the database exclusively, and avoid
traversing the web server log.

The facility uses rating tables that it has generated to generate ranking
values for items in new query results. Figure 8 is a flow diagram showing the steps
preferably performed by the facility to order a query result using a rating table by
generating a ranking value for each item in the query result. In steps 801-807, the
facility loops through each item identified in the query result. In step 802, the facility
initializes a ranking value for the current item. In steps 803-805, the facility loops
through each term occurring in the query. In step 804, the facility determines the rating
score contained by the most recently-generated rating table for the current term and
item. In step 805, if any terms of the query remain to be processed, then the facility
loops up to step 803, else the facility continues in step 806. In step 806, the facility
combines the scores for the current item to generate a ranking value for the item. As an
example, with reference to Figure 6, in processing datum having item identifier
"1883823064", the facility combines the score "116" extracted from entry 602 for this
item and the term "dynamics", and the score "211" extracted from entry 605 for this
item and the term "human". Step 806 preferably involves summing these scores. These
scores may be combined in other ways, however. In particular, scores may be adjusted
to more directly reflect the number of query terms that are matched by the item, so that
items that match more query terms than others are favored in the ranking. In step 807,
if any items remain to be processed, the facility loops back to step 801 to process the
next item, else the facility continues in step 808. In step 808, the facility displays the
items identified in the query result in accordance with the ranking values generated for
the items in step 806. Step 808 preferably involves sorting the items in the query result
in decreasing order of their ranking values, and/or subsetting the items in the query
result to include only those items above a threshold ranking value, or only a
predetermined number of items having the highest ranking values. After step 808, these
steps conclude.
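
The steps of Figure 8 might be sketched in Python as shown below, using summation as the
combining operation of step 806. Representing the rating table as a dictionary keyed by
(term, item identifier), treating a missing entry as a score of zero, and the names
rank_query_result, threshold, and top_n are assumptions made for this sketch. The item
identifier "0000000000" is a made-up item with no rating entries.

```python
def rank_query_result(items, terms, rating_table, threshold=None, top_n=None):
    """Return (item, ranking value) pairs in decreasing order of ranking value."""
    ranked = []
    for item in items:                                    # steps 801-807
        ranking = 0                                       # step 802
        for term in terms:                                # steps 803-805
            # Step 804: look up the rating score for this term and item.
            # Step 806: combine by summing (missing entries count as zero here).
            ranking += rating_table.get((term, item), 0)
        ranked.append((item, ranking))
    # Step 808: sort in decreasing order of ranking value, then optionally subset.
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    if threshold is not None:
        ranked = [pair for pair in ranked if pair[1] > threshold]
    if top_n is not None:
        ranked = ranked[:top_n]
    return ranked


# The two rating scores cited from Figure 6 (entries 602 and 605).
rating_table = {
    ("dynamics", "1883823064"): 116,
    ("human", "1883823064"): 211,
}
print(rank_query_result(["0000000000", "1883823064"], ["human", "dynamics"], rating_table))
# [('1883823064', 327), ('0000000000', 0)]
```

An adjustment favoring items that match more query terms, as mentioned above, could
replace the plain sum in the inner loop; the text leaves the form of that adjustment open.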

Figure 9 is a flow diagram showing the steps preferably performed by
the facility to select a few items in a query result having the highest ranking values
using a rating table. In steps 901-903, the facility loops through each term in the query.
In step 902, the facility identifies among the table entries for the current term those
entries having the three highest rating scores. For example, with reference to Figure 6,
if the only entries in item rating table 600 for the term "dynamics" are entries 601, 602,
603, and 607, the facility would identify entries 601, 602, and 603, which are the entries
for the term "dynamics" having the three highest rating scores. In additional preferred
embodiments, a small number of table entries other than three is used. In step 903, if
additional terms remain in the query to be processed, then the facility loops back to step
901 to process the next term in the query, else the facility continues in step 904. In
steps 904-906, the facility loops through each unique item among the identified entries.
In step 905, the facility combines all of the scores for the item among the identified
entries. In step 906, if additional unique items remain among the identified entries to be
processed, then the facility loops back to step 904 to process the next unique item, else
the facility continues in step 907. As an example, if, in item rating table 600, the
facility selected entries 601, 602, and 603 for the term "dynamics", and selected entries
604, 605, and 606 for the term "human", then the facility would combine the scores
"116" and "211" for the item having item identifier "1883823064", and would use the
following single scores for the remaining item identifiers: "77" for the item having
item identifier "0814403484", "45" for the item having item identifier "9676530409",
"12" for the item having item identifier "6303702473", and "4" for the item having item
identifier "0801062272". In step 907, the facility selects for prominent display items
having the top three combined scores. In additional embodiments, the facility selects a
small number of items having the top combined scores that is other than three. In the
example discussed above, the facility would select for prominent display the items
having item identifiers "1883823064", "0814403484", and "9676530409". Because the
facility in step 907 selects items without regard for their presence in the query result, the
facility may select items that are not in the query result. This aspect of this embodiment
is particularly advantageous in situations in which a complete query result is not
available when the facility is invoked. Such is the case, for instance, where the query
server only provides a portion of the items satisfying the query at a time. This aspect of
the invention is further advantageous in that, by selecting items without regard for their
presence in the query result, the facility is able to select and display to the user items
relating to the query even where the query result is empty, i.e., when no items
completely satisfy the query. After step 907, these steps conclude.
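
The selection of Figure 9 might be sketched in Python as follows. The dictionary form of
the rating table and the names select_prominent_items, per_term, and top_k are assumptions
of this sketch. The text ties only the scores 116 and 211 to particular terms, so the
pairing of the scores 77, 45, 12, and 4 with "dynamics" or "human" is assumed for
illustration, and the entry for item "0000000000" is a made-up stand-in for entry 607.

```python
import heapq
from collections import defaultdict


def select_prominent_items(terms, rating_table, per_term=3, top_k=3):
    """Return the top_k (item, combined score) pairs without consulting a query result."""
    identified = []
    for term in terms:                                        # steps 901-903
        entries = [(score, item)
                   for (t, item), score in rating_table.items() if t == term]
        # Step 902: keep the entries with the highest rating scores for this term.
        identified.extend(heapq.nlargest(per_term, entries))
    combined = defaultdict(int)
    for score, item in identified:                            # steps 904-906
        combined[item] += score                               # step 905
    # Step 907: select the items having the top combined scores.
    return sorted(combined.items(), key=lambda pair: pair[1], reverse=True)[:top_k]


rating_table = {
    ("dynamics", "0814403484"): 77,    # term pairing assumed
    ("dynamics", "1883823064"): 116,   # entry 602
    ("dynamics", "9676530409"): 45,    # term pairing assumed
    ("dynamics", "0000000000"): 1,     # hypothetical fourth entry, not among the top three
    ("human", "6303702473"): 12,       # term pairing assumed
    ("human", "1883823064"): 211,      # entry 605
    ("human", "0801062272"): 4,        # term pairing assumed
}
print(select_prominent_items(["dynamics", "human"], rating_table))
# [('1883823064', 327), ('0814403484', 77), ('9676530409', 45)]
```

Because the function reads only the rating table, it returns candidates even when the
query result is empty or only partly available, which is the property noted above.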

While the present invention has been shown and described with
reference to preferred embodiments, it will be understood by those skilled in the art that
various changes or modifications in form and detail may be made without departing
from the scope of the invention. For example, the facility may be used to rank query
results of all types. The facility may use various formulae to determine, in the case of
each item selection, the amount by which to augment rating scores with respect to the
selection. Further, the facility may employ various formulae to combine rating scores
into a ranking value for an item. The facility may also use a variety of different kinds
of selection actions to augment the rating table, and may augment the rating table for
more than one kind of selection action at a time. Additionally, the facility may
augment the rating table to reflect selections by users other than human users, such as
software agents or other types of artificial users.

CLAIMS

We claim: 



1. A method in a computer system for ranking items in a search
result, the method comprising the steps of:
    for each of a multiplicity of search terms, compiling data indicating the
    extent to which users have selected each of a multiplicity of items when returned in
    search results produced from queries containing the search term;
    receiving a query and a search result, the received query containing a
    term among the multiplicity of terms, the received search result identifying a plurality
    of items among the multiplicity of items that satisfy the received query; and
    using the compiled data to rank at least a portion of the items identified
    in the received search result in accordance with the extent to which users have selected
    each of the plurality of items identified in the received search result when returned in
    search results produced from queries containing the search term contained in the
    received query.

2. The method of claim 1 wherein at least a portion of the users are
identified with one of a plurality of demographic groups, and wherein the compiling
step compiles, for each demographic group of the plurality of demographic groups, data
indicating the extent to which users identified with the demographic group have
selected each of a multiplicity of items when returned in search results produced from
queries containing the search term, and wherein the received query is submitted on
behalf of a distinguished user identified with a distinguished demographic group, and
wherein the ranking step uses the compiled data to rank the items identified in the
received search result in accordance with the extent to which users identified with the
distinguished demographic group have selected each of the plurality of items identified
in the received search result when returned in search results produced from queries
containing the search term contained in the received query.

3. The method of claim 1, further comprising the step of imposing
on the items identified in the distinguished search result an order in which the ranking
values of the items monotonically decrease.

4. The method of claim 1, further comprising the step of using the
ranking values of the items identified in the distinguished search result to create a
proper subset of the items.

5. The method of claim 4 wherein the creating step creates a subset
of the items identified in the distinguished search result that contains all of the items
whose ranking values exceed a predetermined minimum ranking value.

6. The method of claim 4 wherein the creating step creates a subset
of the items identified in the distinguished search result that contains all of the items
whose ranking values exceed a predetermined minimum ranking value.

7. The method of claim 1 wherein the increasing step increases
rating values for selections made to display additional information about items.

8. The method of claim 1 wherein the increasing step increases
rating values for selections made to purchase items.

9. The method of claim 1 wherein the increasing step increases
rating values for selections made to add items to a tentative list of purchases.

10. The method of claim 1 wherein the increasing step increases
rating values for selections of portions of detailed information displayed about items.

11. The method of claim 1 wherein the increasing step increases
rating values for units of time for which the user displays detailed information about
items.

12. A computer-readable medium whose contents cause a computer
system to rank items in a search result by performing the steps of:
    receiving a query specifying one or more terms;
    generating a query result identifying a plurality of items satisfying the
    query; and
    for each item identified in the query result, combining the relative
    frequencies with which users selected the item in earlier queries specifying each of the
    terms of the query to produce a ranking value for the item.

13. The computer-readable medium of claim 12 wherein the contents
of the computer-readable medium further cause the computer system to perform the step
of adjusting the ranking value produced for each item identified in the query result to
reflect the number of terms specified by the query that are matched by the item.

14. A computer system for ranking items in a search result,
comprising:
    a query memory that stores information about previously submitted
    queries and items selected from the query results of previously submitted queries;
    a query receiver that receives queries each specifying one or more terms;
    a query server that generates a query result for each query received by
    the query receiver that identifies a plurality of items satisfying the query; and
    an item ranking subsystem that, for each query result generated by the
    query server, for at least a portion of the items identified in the query result, combines
    from the contents of the query memory the relative frequencies with which users
    selected the item in earlier queries specifying each of the terms of the query to
    produce a ranking value for the item.

15. A computer memory containing a user behavior data structure
usable to rank the relevance of items in a query result, the data structure comprising a
plurality of rating scores, each rating score corresponding both to a query term and to an
item, and reflecting quantitatively the extent to which users have selected the item from
query results generated from queries specifying the query term, such that the data
structure may be used to rank items in a distinguished query result produced for a
distinguished query by, for each item in the distinguished query result, retrieving from
the data structure the rating scores corresponding to the item and any term specified in
the distinguished query and combining the retrieved rating scores to generate a ranking
value for the item.